Abstract

The effectiveness of any given mapping of workload to processors in a parallel system is
dependent on the stochastic behavior of the workload. Program behavior is often characterized by a 
sequence of phases, with phase changes occurring unpredictably. During a phase, the behavior is fairly 
stable, but may become quite different during the next phase. Thus a workload assignment generated 
for one phase may hinder performance during the next phase. We consider the problem of deciding 
whether to remap a parallel computation in the face of uncertainty in remapping’s utility. Fundamentally,
it is necessary to balance the expected remapping performance gain against the delay cost of
remapping. This paper treats this problem formally by constructing a probabilistic model of a computation
with at most two phases. We use stochastic dynamic programming to show that the remapping
decision policy which minimizes the expected running time of the computation has an extremely simple
structure: the optimal decision at any step is found by comparing the probability of remapping
gain against a threshold. This theoretical result stresses the importance of detecting a phase change, and
assessing the possibility of gain from remapping. We also empirically study the sensitivity of optimal 
performance to imprecise decision thresholds. Under a wide range of model parameter values, we find 
nearly optimal performance if remapping is chosen simply when the gain probability is high. These 
results strongly suggest that except in extreme cases, the remapping decision problem is essentially that 
of dynamically determining whether gain can be achieved by remapping after a phase change; precise 
quantification of the decision model parameters is not necessary. 


This research was supported in part by the National Aeronautics and Space Administration under NASA 
Contracts NAS1-17070 and NAS1-18107 while the first author was in residence at ICASE, Mail Stop 132C, 
NASA Langley Research Center, Hampton, VA 23665. It was also supported in part by the Virginia Center for 
Innovative Technology, while both authors were in residence at the University of Virginia, Department of Com- 
puter Science, Thornton Hall, Charlottesville, VA 22903. 





1. Introduction 

An important issue in parallel processing is the assignment of workload to processors. A common 
model of this problem is to assume that a program is composed of a number of communicating 
modules, and that each module is to be assigned to a processor in the target parallel system. The 
assignment algorithm takes a global view of the system, and must consider processors’ capacity, any 
special affinity a module has for a processor (e.g. a module may require assignment to a processor with 
a floating point accelerator), module execution requirements, inter-module communication, and any 
access to files and data structures that a module may require. The assignment algorithm may simultane- 
ously assign files to different storage devices, so we will speak of the mapping of the computation, 
rather than just the module assignment. Any reasonable mapping algorithm must take into account the 
expected behavior of the mapped computation, because the efficiency of a parallel computation depends 
heavily on how well its mapping both exploits available parallelism, and minimizes the communication 
and synchronization overhead. Both of these factors are determined by the underlying stochastic 
behavior of the computation. If during run-time the anticipated behavior changes and causes a 
mismatch between mapping and behavior, performance will deteriorate. In this case, it can be desirable 
to dynamically remap the computation. Because of the complicated considerations often involved in 
task and file assignment, it may not be feasible to allow processors to move modules, files, and data 
structures around in a dynamic decentralized fashion. A global mapping (or remapping) algorithm is 
better able to consider all aspects of the assignment problem, especially if the parallel system is tightly 
coupled. 

Another type of workload assignment is sometimes employed for parallel scientific computations. 
These computations are often composed of numerical calculations at each point in a discretized spatial 
(or transformed) domain. Workload assignment in this context involves partitioning the discretized 
domain points into regions which are then mapped to processors [2]; usually the number of regions 
equals the number of processors. A processor’s workload is then a function of the domain points in the
region it receives. While the region partitioning is usually concerned with balancing the execution 
workload in each region, the assignment of regions to processors is mostly concerned with the com- 
munication costs incurred by the assignment. To dynamically remap the computation, we repartition the 
problem domain and assign the resulting regions to processors. Remapping might be desired if the 
number of domain points in a processor’s region changes, or if the calculations required at the domain 
points change. A good example of this phenomenon is the behavior of shock-capturing techniques in 
computational gas dynamics [4]; the behavior change occurs when a shock develops and the numerical 
technique attempts to resolve the shock features with additional grid points. Since the domain was ori- 
ginally partitioned by physical region, the additional grid points can create a workload imbalance which 
leads to a performance decline. Many other numerical techniques adaptively change the grid in 
response to solution behavior and may thus exhibit phase-like behavior, for examples see [1], 

In both of the aforementioned types of computation, we can expect run-time behavior to change
unexpectedly, and to the detriment of run-time performance. Dynamic remapping might improve performance,
but remapping raises a number of issues including (1) whether to use global remapping or
decentralized and localized remapping, (2) determining that a phase change leading to potential performance
gain has occurred, (3) determining a new mapping and its implementation, (4) determining the
gains and costs of remapping, (5) determining the performance loss of not remapping after a phase
change, and (6) optimally choosing when to remap. This paper treats only the last issue, which must
essentially balance all costs, gains, and uncertainties involved in the decision to remap. All of the
other issues are likely to be problem and system dependent, and are interesting research issues in their 
own right. We focus on the remapping decision problem because it brings whatever solutions are 
found for all of the other issues (aside from (1)) together in a cohesive bundle. We are furthermore 
motivated to treat the decision problem mathematically, because a mathematical model can attempt to 
abstract the salient features of many diverse remapping situations. Furthermore, our model allows us
to identify an optimal remapping decision policy for a two-phase model. Optimal model performance
gives us a baseline we use to study the sensitivity of model performance to the non-estimation of 
remapping’s costs and gains. The somewhat surprising conclusion of our sensitivity study is that for 
the purposes of deciding when to remap, the issues of estimating remapping’s gains and losses are 
often not issues at all. Careful use of a formal decision model in real situations would require careful 
estimation of quantities which are difficult to predict a priori, e.g. post phase change performance 
degradation, post remapping performance gain, and gain detection accuracy. Our empirical study shows 
that our model achieves nearly optimal performance by simply remapping when the likelihood of 
achieving better performance after remapping is high. This implies that careful estimation of major 
decision model parameters is not necessary. We also show that this decision heuristic is effective when 
the number of phase changes is not limited to two. Of course, the applicability of this result to real 
remapping situations depends on the degree to which the situation matches the model; nevertheless, the 
result suggests that the dynamic remapping decision problem is manageable. 

We have argued that some computations can benefit from dynamic global remapping. Most of 
the related literature does not address this particular problem. The work reported in [6], [7], [9], and 
[25] essentially presumes that jobs arrive at a central dispatcher which assigns jobs to processors. Our 
problem eschews the job arrival model, and does not allow a dynamic routing mechanism. A more 
recent body of work including [8], [13], [22], [23], [26] allows decentralized assignment decisions to 
be made dynamically. Again, our problem presumes that incremental dynamic reassignment is not 
feasible. Static and dynamic task assignment algorithms are presented in [2], [3], [5], [10], [11], [12], 
[17], [24]. The dynamic assignment algorithms consider restricted classes of computations; the static
assignment algorithms might be used in conjunction with a remapping decision policy if the statically 
assigned computations abruptly change behavior. In [16] we treat dynamic remapping of parallel com- 
putations whose behaviors change constantly, and gradually; we focus here on behavior which changes 
radically, and abruptly. This paper is an extension of earlier treatments of this problem in [14] and 
[15]. Our approach is a variation on aspects of the broad treatment of change detection under uncertainty
given in [18]. Our model modifies this analysis by using a different decision cost structure, and
by assuming a random computation duration. Our two main contributions are (1) to apply Markov 
decision theory to the performance issue of when to reconfigure a workload distribution in a parallel 
processing environment; (2) to show empirically that nearly optimal performance can be achieved 
without quantifying important decision model parameters. 

Section 2 describes the decision model we study, and identifies important functions related to the 
remapping problem. Section 3 discusses the optimal decision policy, showing that it is a threshold pol- 
icy. The calculation and behavior of the optimal thresholds is also discussed. Section 4 reports the 
results of an empirical study which examined the relative performance of fixed-threshold decision 
heuristics. Section 5 presents our conclusions, and the Appendix treats analytic issues in detail. 

2. Problem Model 

Our application of decision process theory requires that we identify a sequence of decision points 
in time where the decision is made whether to remap. Accordingly, we assume that the computation 
of interest gives rise to a natural sequence of decision points. For example, the end of an iteration in 
an iterative numerical program is a natural decision point. Likewise, natural decision points could be 
found in an embedded real-time system which periodically calls monitoring tasks. We define a cycle to 
be the amount of computation performed between two decision points. The time required by the system 
to execute a cycle, a cycle time, is assumed to be random. While we could allow the mean cycle time
to depend on the decision points defining the cycle, for simplicity we will assume that mean cycle
times during a phase are identical. We do not assume that cycle time distributions are independent, nor
identically distributed. Also for simplicity’s sake we assume that at most one phase change will occur
during the course of the computation. We let $e_F$ be the mean cycle time achieved during the first
phase. $e_B$ is the mean cycle time achieved after the phase change, but while the original mapping is in
place. Performance declines after the phase change if $e_B > e_F$. $e_R$ is the mean cycle time during the
second phase, after a remapping. We implicitly assume that $e_R < e_B$, but make no assumptions regarding
the relationship between $e_R$ and $e_F$.

A natural way of measuring the amount of work performed by the computation is to enumerate 
the number of cycles executed. We let N denote this quantity, and allow N to be random. We also 
assume that N is bounded from above by a constant M, and that N has an increasing failure rate function.
The usefulness of this latter assumption is buried deeply in the proof of Lemma 1. Here we will
simply state that an increasing failure rate means that the quotient

$$\frac{\mathrm{Prob}\{N = n\}}{\mathrm{Prob}\{N > n\}}$$

is an increasing function of n. This intuitively means that the longer the computation continues, the
more likely it is that termination occurs at the present cycle.
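
The monotonicity condition on the quotient can be checked numerically for any candidate distribution of N. The following is a minimal sketch, not part of the original analysis; the function names `hazard_quotients` and `is_nondecreasing` and the uniform example distribution are illustrative assumptions.

```python
from fractions import Fraction

def hazard_quotients(pmf):
    """Return Prob{N = n} / Prob{N > n} for each n with Prob{N > n} > 0.

    pmf maps each n in the support of N to Prob{N = n}.
    """
    ns = sorted(pmf)
    quotients = {}
    for n in ns:
        tail = sum(pmf[k] for k in ns if k > n)  # Prob{N > n}
        if tail > 0:
            quotients[n] = pmf[n] / tail
    return quotients

def is_nondecreasing(values):
    """True if the sequence never decreases."""
    return all(a <= b for a, b in zip(values, values[1:]))

# Illustrative (assumed) distribution: N uniform on {1, ..., 5}.
uniform = {n: Fraction(1, 5) for n in range(1, 6)}
q = hazard_quotients(uniform)
print(q, is_nondecreasing(list(q.values())))
```

For the uniform example the quotient grows with n, so the assumption holds; a distribution whose quotient falls with n would violate it.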

Our decision model describes two types of uncertainty in the remapping problem. We cannot be
sure when (or if) we can gain better performance by remapping. This uncertainty is modeled by
presuming that the occurrence of a phase change leading to potential remapping gain is random; the
probability distribution of the cycle during which this occurs is assumed to have a constant failure rate
$\phi$. The second source of uncertainty is related, but is perhaps more subtle. At a decision point we
will employ some (problem dependent) mechanism to test for remapping gain. This mechanism might
look for a decline in processor utilizations, or it might be able to examine what code has recently been
executed. Based on such examinations, the mechanism can give us some indication of whether a new
mapping is called for, but we cannot be certain that the mechanism is absolutely reliable. It might
prematurely report the possibility of gain, or it might fail to report an existing possibility of gain. This
type of uncertainty is modeled by assuming that every invocation of the mechanism has a probability $\alpha$
of prematurely reporting possible gain, and a probability $\beta$ of failing to report existing possible gain.

At every decision point in the computation, a decision policy decides whether to remap. The
decision algorithm will consult the gain testing mechanism described above, but is distinct from that 
mechanism. The decision to remap is based on a gain probability which is calculated as a function of 
the response received from the gain testing mechanism. The probability is initially zero; if the first 
gain test detects no possibility for gain, we have evidence that there is no immediate gain from remap- 
ping. But the probability of remapping gain can no longer be zero since it is possible for a phase 
change to have occurred during the first cycle, and it is possible for remapping gain to be achieved, 
and it is possible that the test mechanism failed to detect the potential gain. The true value of the gain 
probability in this case depends on the values of $\phi$ and $\beta$. Similar observations hold if potential gain is
reported. Bayes’ Theorem [21] gives us a mechanism for calculating this probability. In general, let $p_n$
be the probability of remapping gain calculated at the nth decision point. Initially, $p_1 = 0$. Supposing
that $p_{n-1} = p$, $p_n$ is found by first calculating


$$p^*(p) = p + \phi(1 - p) = (1 - \phi)p + \phi. \tag{1}$$

$p^*(p)$ is interpreted as the probability that gain will be possible by step n, given that $p_{n-1} = p$. This probability
is calculated at step n-1, and so cannot consider the gain detection mechanism’s report at step
n. The value of $p_n$ depends on this report, and is calculated using Bayes’ Theorem as follows. If
potential gain is reported, $p_n$ is given by $p^c(p)$:

$$p_n = p^c(p) = \frac{p^*(p)(1 - \beta)}{p^*(p)(1 - \beta) + (1 - p^*(p))\alpha}. \tag{2}$$

Given a negative indication of potential gain, $p_n$ is defined by $p^{\bar c}(p)$:

$$p_n = p^{\bar c}(p) = \frac{p^*(p)\beta}{p^*(p)\beta + (1 - p^*(p))(1 - \alpha)}. \tag{3}$$

We will require one other related probability. Let $q^c(p)$ be the probability that the gain detection
mechanism will report potential gain at step n, given that $p_{n-1} = p$. By conditioning on whether gain is
actually possible by step n, it is not difficult to see that

$$q^c(p) = p^*(p)(1 - \beta) + (1 - p^*(p))\alpha. \tag{4}$$

The probability of the mechanism reporting no gain at step n given $p_{n-1} = p$ is simply
$q^{\bar c}(p) = 1 - q^c(p)$.
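
The gain probability update described by equations (1) through (4) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names (`p_star`, `q_pos`, `p_pos`, `p_neg`, `update`) and the parameter values in the usage lines are assumptions, not taken from the paper. It assumes $\phi > 0$ and $0 < \alpha, \beta < 1$ so that no denominator vanishes.

```python
def p_star(p, phi):
    """Eq. (1): probability that gain is possible by the next step, before the test."""
    return p + phi * (1.0 - p)

def q_pos(p, phi, alpha, beta):
    """Eq. (4): probability that the gain test reports potential gain."""
    ps = p_star(p, phi)
    return ps * (1.0 - beta) + (1.0 - ps) * alpha

def p_pos(p, phi, alpha, beta):
    """Eq. (2): posterior gain probability after a positive report."""
    ps = p_star(p, phi)
    return ps * (1.0 - beta) / q_pos(p, phi, alpha, beta)

def p_neg(p, phi, alpha, beta):
    """Eq. (3): posterior gain probability after a negative report."""
    ps = p_star(p, phi)
    # 1 - q_pos equals ps*beta + (1 - ps)*(1 - alpha), the denominator of eq. (3).
    return ps * beta / (1.0 - q_pos(p, phi, alpha, beta))

def update(p, report, phi, alpha, beta):
    """Advance the gain probability by one decision step given the test report."""
    return p_pos(p, phi, alpha, beta) if report else p_neg(p, phi, alpha, beta)

# Usage with assumed values: start from zero and apply a sequence of reports.
p = 0.0
for report in (False, False, True, True):
    p = update(p, report, phi=0.03, alpha=0.25, beta=0.25)
    print(report, round(p, 4))
```

A useful sanity check is the law of total probability: weighting the two posteriors by the report probabilities must return $p^*(p)$.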

A decision to remap incurs some explicit costs. First, there is a delay cost of calculating the
remapping. We will furthermore suppose that after calculating the new mapping it is possible to compare
its performance with that of the old mapping. If gain is possible from remapping, the new mapping will be found
to be superior; otherwise it will not. This test can help us avoid an unnecessary and potentially costly
implementation of the new mapping. We therefore let $D_T$ be the delay cost of calculating and testing a
new mapping, and let $D_I$ be the delay cost of actually implementing that mapping. Table I summarizes
our decision model definitions.


Notation              Definition

n                     Decision Step Number
N                     Random number of decision steps
M                     Upper Bound on N
$N_n$                 N given N > n
$\bar N_n$            $E[N_n]$
$e_F$                 Decision Interval Pre-Gain Execution Time, Original Mapping
$e_B$                 Decision Interval Post-Gain Execution Time, Original Mapping
$e_R$                 Decision Interval Post-Gain Execution Time, New Mapping
$D_T$                 Delay to Calculate and Test New Mapping
$D_I$                 Delay to Implement New Mapping
$\alpha$              Gain Test False Alarm Error
$\beta$               Gain Test Missed Gain Error
$\phi$                Time of Gain Failure Rate Probability
$p^*(p)$              Pre-Observation Probability of Gain At Next Decision Step
$p^c(p)$              Posterior Probability of Gain After Positive Gain Observation
$p^{\bar c}(p)$       Posterior Probability of Gain After Negative Gain Observation
$q^c(p)$              Probability of Observing Gain Next Observation
$q^{\bar c}(p)$       Probability of Not Observing Gain Next Observation

Table I

Every decision made by a remapping decision process incurs an explicit cost which reflects the
expected cycle time of the cycle following the decision, and any remapping overhead costs. For example,
if a phase change has occurred which allows remapping gain, but the old mapping is retained, then
$e_B$ is the expected cycle time of the next cycle. If remapping is chosen, but no gain is yet possible, an
overhead cost of $D_T$ is suffered, the absence of gain is discovered, and the old mapping is retained. If
remapping is chosen when gain is achievable, then an overhead cost of $D_T + D_I$ is suffered, but then
every remaining cycle has a mean execution time of $e_R$. The total computation execution time is the
sum of all cycle times plus remapping overhead; an optimal decision policy minimizes the expectation
of this sum. We see that the optimal decision policy should depend somehow on the various costs and
gains involved in the remapping process, the remaining length of the computation, and the degree of
our certainty that gain is possible. One way to express the inter-relationships between all of these concerns
is as a stochastic dynamic programming problem. Given gain probability p at step n, let
$V(\langle p,n\rangle)$ denote the expected remaining execution time of the computation if we use the optimal decision
policy. In the parlance of Markov decision processes [19], we are defining $\langle p,n\rangle$ to be the state of
the process at step n, and $V(\langle p,n\rangle)$ to be the optimal (stationary) cost function. If we choose to retain
the old mapping at step n, the next cycle’s expected execution time is

$$p^*(p)e_B + (1 - p^*(p))e_F.$$

Note that this expression anticipates the possibility that the next cycle will be the first during which
gain is possible; in this case the next cycle’s mean execution time is assumed to be $e_B$. Let $E_V(\langle p,n\rangle)$
be the expected remaining execution time after step n+1, using the optimal decision policy, given a
gain probability p at step n and retention of the mapping at step n. Then the expected execution time
remaining after step n, achieved by keeping the old mapping now and thereafter using the optimal
decision policy is

$$C_t(\langle p,n\rangle) = p^*(p)e_B + (1 - p^*(p))e_F + E_V(\langle p,n\rangle). \tag{5}$$

We call $C_t$ the retain cost function. We similarly define $C_m(\langle p,n\rangle)$, the remap cost function.
$C_m(\langle p,n\rangle)$ is the expected remaining execution time achieved by choosing to remap now, and
thereafter using the optimal decision policy. By choosing to remap, we immediately incur an overhead
cost $D_T$. If after testing the new mapping we find that gain is possible, then all remaining cycles have
mean execution time $e_R$. There are an expected number $\bar N_n - n + 1$ of these cycles, where $\bar N_n$ denotes
the expected value of N given that N > n. Furthermore, we consider the decision process to have
stopped at this point, because the single possible change has occurred. If the new mapping is found to
be no better than the old, then the old mapping is retained, the next cycle has mean execution time
$p^*(0)e_B + (1 - p^*(0))e_F$, the probability of gain is set to zero, and the decision process continues. Thus
we see that

$$C_m(\langle p,n\rangle) = D_T + p\left[D_I + (\bar N_n - n + 1)e_R\right] + (1 - p)\left[p^*(0)e_B + (1 - p^*(0))e_F + E_V(\langle 0,n\rangle)\right]. \tag{6}$$

The function $E_V(\langle\cdot,n\rangle)$ appears in the definition of both $C_t(\langle\cdot,n\rangle)$ and $C_m(\langle\cdot,n\rangle)$. Since it
describes optimal decision policy costs after step n+1, it is stated in terms of $V(\langle\cdot,n+1\rangle)$. $E_V(\langle\cdot,n\rangle)$
is a function expressing expected values, taken with respect to the probability of reporting potential
gain at step n+1. Given $p_n = p$, $p_{n+1}$ will be equal to $p^c(p)$ if the mechanism at step n+1 reports potential
gain; this will occur with probability $q^c(p)$. Similar observations apply in the event that no change
is observed. Thus we see that

$$E_V(\langle p,n\rangle) = q^c(p)\,V(\langle p^c(p),n+1\rangle) + q^{\bar c}(p)\,V(\langle p^{\bar c}(p),n+1\rangle).$$

Now the principle of optimality states that

$$V(\langle p,n\rangle) = \min\left\{ C_m(\langle p,n\rangle),\; C_t(\langle p,n\rangle) \right\},$$

and that the optimal decision at step n given $p_n = p$ is the decision whose cost function minimizes the
right hand side of the equation above. $C_m(\langle\cdot,n\rangle)$ and $C_t(\langle\cdot,n\rangle)$ are both functions of $V(\langle\cdot,n+1\rangle)$; to
determine the optimal decision policy we need to solve equations recursing on n. Since the number of
decision steps is bounded above by M, we can start the solution procedure by defining
$V(\langle p,M+1\rangle) = 0$ for all $p \in [0,1]$, and then determine $V(\langle p,M\rangle)$ for all $p \in [0,1]$. An algorithm for
computing $V(\langle\cdot,n\rangle)$ in terms of $V(\langle\cdot,n+1\rangle)$ is given in the Appendix. The following section shows
that $V(\langle\cdot,n\rangle)$ has a useful structure, and that the optimal decision from state $\langle p,n\rangle$ is nicely characterized.
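
The backward recursion on n can be illustrated with a short Python sketch. This is not the Appendix algorithm: it is a simplified assumption-laden version that takes the number of cycles to be a constant N (so $\bar N_n = N$), tabulates $V(\langle\cdot,n\rangle)$ on a uniform grid of p values, and evaluates $V$ between grid points by linear interpolation. The names `interp` and `solve_thresholds`, the grid size `K`, and the parameter values in the usage lines (in particular $e_F = 100$) are hypothetical.

```python
from bisect import bisect_right

def interp(x, xs, ys):
    """Piecewise-linear interpolation of (xs, ys) at x, clamped to the grid ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    w = (x - xs[i]) / (xs[i + 1] - xs[i])
    return (1.0 - w) * ys[i] + w * ys[i + 1]

def solve_thresholds(N, e_F, e_B, e_R, D_T, D_I, phi, alpha, beta, K=1000):
    """Grid approximation of the recursion V = min(C_m, C_t), constant N assumed.

    Returns (thresholds, V1): thresholds[n] is the smallest grid value of p at
    which remapping is strictly cheaper at step n (1.0 if there is none), and
    V1 tabulates V(<p, 1>) on the grid. Requires phi > 0 and 0 < alpha, beta < 1.
    """
    grid = [k / K for k in range(K + 1)]
    ps = [p + phi * (1.0 - p) for p in grid]                      # eq. (1)
    q_c = [s * (1.0 - beta) + (1.0 - s) * alpha for s in ps]      # eq. (4)
    p_c = [s * (1.0 - beta) / q for s, q in zip(ps, q_c)]         # eq. (2)
    p_nc = [s * beta / (1.0 - q) for s, q in zip(ps, q_c)]        # eq. (3)
    V_next = [0.0] * (K + 1)                                      # V(<p, N+1>) = 0
    thresholds = {}
    for n in range(N, 0, -1):
        # Retain cost, eq. (5), with E_V evaluated by interpolation.
        C_t = [s * e_B + (1.0 - s) * e_F
               + q * interp(a, grid, V_next) + (1.0 - q) * interp(b, grid, V_next)
               for s, q, a, b in zip(ps, q_c, p_c, p_nc)]
        # Remap cost, eq. (6); the bracketed no-gain term equals C_t at p = 0.
        C_m = [D_T + p * (D_I + (N - n + 1) * e_R) + (1.0 - p) * C_t[0]
               for p in grid]
        remap_at = [p for p, m, t in zip(grid, C_m, C_t) if m < t]
        thresholds[n] = remap_at[0] if remap_at else 1.0
        V_next = [min(m, t) for m, t in zip(C_m, C_t)]
    return thresholds, V_next

# Usage with assumed parameter values.
th, V1 = solve_thresholds(N=100, e_F=100.0, e_B=200.0, e_R=100.0,
                          D_T=1500.0, D_I=1500.0, phi=0.03,
                          alpha=0.25, beta=0.25, K=200)
print(th[1], th[50], th[100])
```

Because the interpolation replaces the exact piecewise linear functions by a grid approximation, the reported thresholds are accurate only to about the grid spacing 1/K.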

3. Optimal Decision Policy Thresholds 

We have noted that the value of the gain probability is a key factor in our decision process. In
this section we show that the optimal decision policy is a threshold policy: for every decision step n
there is a threshold $\pi_n \in [0,1]$ such that the optimal decision in state $\langle p,n\rangle$ is to remap if $p > \pi_n$, and
retain if $p < \pi_n$. We then show why exact calculation of the thresholds $\{\pi_n\}$ is computationally intractable,
report on the computational complexity of an approximation technique having bounded error, and
graphically illustrate the behavior of the $\{\pi_n\}$ as a function of n.

The following lemma provides the fundamental reasons for the optimal policy structure. Its proof 
is somewhat lengthy, and is given in the Appendix. 

LEMMA 1 :

• For all n, $C_m(\langle p,n\rangle)$ is a linear function of p;

• For all n, $C_t(\langle p,n\rangle)$ is a piecewise linear concave function of p;

• There exists $n_0 \in [0,\infty]$ such that for all $n > n_0$, $C_t(\langle p,n\rangle) < C_m(\langle p,n\rangle)$ for all $p \in [0,1]$;

• If $n < n_0$, $C_m(\langle 1,n\rangle) < C_t(\langle 1,n\rangle)$.

□ 


Consider the implications of Lemma 1. For any step $n > n_0$, the retain decision cost function is
always less than the remap decision cost function, implying that we should retain regardless of the
value of the gain probability. In this case, the optimal decision threshold is degenerate, $\pi_n = 1$. For
$n < n_0$ we know that the linear remapping function is less than the concave retain function at p = 1.
Moreover, equations (5) and (6) give $C_m(\langle 0,n\rangle) = D_T + C_t(\langle 0,n\rangle)$, so whenever $D_T > 0$ the retain
function is less than the remapping function at p = 0. It
is therefore geometrically impossible for $C_t(\langle\cdot,n\rangle)$’s functional curve to intersect $C_m(\langle\cdot,n\rangle)$’s functional
curve more than once, as illustrated by figure 1. If $C_t(\langle\cdot,n\rangle)$ and $C_m(\langle\cdot,n\rangle)$ intersect at $\pi_n < 1$, then
$C_t(\langle p,n\rangle)$ is less than $C_m(\langle p,n\rangle)$ for $p \in [0,\pi_n]$, and $C_m(\langle p,n\rangle)$ is less than $C_t(\langle p,n\rangle)$ for $p \in [\pi_n,1]$.
It follows that the optimal decision from state $\langle p,n\rangle$ is to retain if $p < \pi_n$, and to remap if $p > \pi_n$. We
summarize this result:

THEOREM 1 : For every step n, there exists a $\pi_n$ such that the optimal decision from $\langle p,n\rangle$ is to
remap if and only if $p > \pi_n$.

□ 


If all the decision model parameters are known, then in theory we can solve the equations
describing $V(\langle p,n\rangle)$ and determine each optimal threshold. In practice there are significant obstacles to
this procedure. We may not be able to quantify the model parameters; we defer this problem to section
4. A computational problem arises from the fact that the optimal cost functions $V(\langle\cdot,n\rangle)$ are all
piece-wise linear. If a piece-wise linear function changes its linear description at domain point d, we
will call d a transition point. For any piece-wise linear function g on [0, 1], let D(g) be the set of g’s
transition points. In [14] we show that

$$D(C_t(\langle\cdot,n-1\rangle)) = \{\, q > 0 \mid p^c(q) = d \text{ or } p^{\bar c}(q) = d \text{ for some } d \in D(V(\langle\cdot,n\rangle)) \,\}.$$

This means that every transition point for $V(\langle\cdot,n\rangle)$ can give rise to two distinct transition points for
$C_t(\langle\cdot,n-1\rangle)$. Any of these transition points greater than $\pi_{n-1}$ will not appear in $D(V(\langle\cdot,n-1\rangle))$;
nevertheless, we see that the number of line segments defining $V(\langle\cdot,n\rangle)$ essentially doubles at every
step of the recursive solution. Then in general, an exact solution is not computationally feasible. However,
in the Appendix we describe an approximation procedure which estimates $V(\langle\cdot,n\rangle)$ to any
desired degree of accuracy. Furthermore, at every step this procedure is linear in the number of transition
points, and over a broad class of approximations, it minimizes the number of transition points
required to achieve the desired accuracy. Our computational experience with this approximation shows
that it is quite robust. With the scale of parameter values we used, generally fewer than 200 transition
points in the approximation bounded the error in approximating $V(\langle\cdot,n\rangle)$ for each n by $10^{-5}$.

Figure 2 illustrates the behavior of the optimal thresholds for four different sets of parameter
values. For computational simplicity, the number of cycles in the computation was assumed to be constant
rather than random. In our experience, we have always seen the tendency for the optimal thresholds
to remain relatively constant, except for n nearest N. Note also that as the per cycle gain
$G = e_B - e_R$ increases (for fixed costs D), the converged value of the optimal threshold decreases.
Intuitively this is true because the smaller the gain from remapping is, the more certain we should be
that gain is possible before choosing remapping and suffering the attendant overhead. Another tendency
is that the region where $\pi_n = 1$ increases as the remapping overhead costs $D = D_T + D_I$ increase.
This too makes sense, because the expected overall remapping gain depends on the expected remaining
length of the computation. If this is small, then a large remapping overhead may not be amortized,
and it is better to simply suffer the "bad" cycle times $e_B$ until termination.

4. Model Sensitivity 

Calculation of the optimal decision policy requires quantification of the model parameters.
However, some of these parameters may be difficult to estimate at run-time. For example, it is unreasonable
to assume that we can accurately predict the post-gain cycle execution time means $e_R$ and $e_B$.
Because of this problem, we examined the deviation of run-time performance from optimal performance
if remapping is chosen whenever $p_n > p$, for some fixed p. This class of heuristics is suggested
by the behavior of the optimal thresholds, as illustrated by figure 2: $\pi_n$ is relatively constant
for most of the computation’s steps. Our experiments varied p between 0.05 and 0.95. This section
reports the results of that study. We find that over a wide range of model parameter values, nearly
optimal performance is achieved if p is chosen to be high, but not excessively so (e.g. $p \in [0.7, 0.8]$).
From this data we conclude that the estimation of remapping costs and gains is not as critical a problem
as we might otherwise suppose. We then examine the sensitivity of performance when we
incorrectly calculate the gain probability. We still find that for limited degrees of inaccuracy in our
estimates of $\alpha$, $\beta$, and $\phi$, nearly optimal performance can be achieved. From this data we conclude that
the critical issue in the remapping decision problem is to be able to determine when some kind of gain
might be possible by remapping. Finally, we note that the decision heuristic extends in a natural way
to computations with potentially more than two phases, and see again that the heuristic is effective.

Our empirical study used a simulation of the analytic model we have already discussed. For
every set of model parameters, the optimal decision thresholds were approximated with high accuracy.
Then for each $p = 0.05m$, m = 1, 2, ..., 19, 1000 simulation runs of the modeled system were performed,
tabulating the run-time achieved under the optimal policy, and the policy which remaps whenever
the gain probability exceeds p. The run-time achieved by never remapping was also tabulated.
For each parameter set and for both non-optimal policies, the relative difference between the non-optimal
policy performance and the optimal policy performance was calculated. We graphically
mapped this relative difference as a function of p.
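
A simulation of a fixed-threshold policy along these lines can be sketched as follows. This is a hedged illustration rather than the simulator used in the study: it assumes a constant number of cycles N, charges each cycle its mean time instead of sampling a cycle time distribution, and uses hypothetical names (`simulate`, `mean_cost`) and parameter values. It compares fixed-threshold policies only against the never-remap policy, since the optimal thresholds are not computed here.

```python
import random

def simulate(N, e_F, e_B, e_R, D_T, D_I, phi, alpha, beta, threshold, rng):
    """One run of the two-phase model; threshold=None means never remap."""
    total, p, gain = 0.0, 0.0, False
    for n in range(1, N + 1):
        if threshold is not None and p > threshold:
            total += D_T                         # calculate and test a new mapping
            if gain:                             # gain confirmed: implement it
                return total + D_I + (N - n + 1) * e_R
            p = 0.0                              # no gain yet: keep the old mapping
        # The phase change may occur during this cycle.
        gain = gain or rng.random() < phi
        total += e_B if gain else e_F
        # Gain test at the next decision point, then the Bayes update.
        ps = p + phi * (1.0 - p)
        report = rng.random() < ((1.0 - beta) if gain else alpha)
        if report:
            p = ps * (1.0 - beta) / (ps * (1.0 - beta) + (1.0 - ps) * alpha)
        else:
            p = ps * beta / (ps * beta + (1.0 - ps) * (1.0 - alpha))
    return total

def mean_cost(runs, seed, **kw):
    """Average execution time over independent runs with a seeded generator."""
    rng = random.Random(seed)
    return sum(simulate(rng=rng, **kw) for _ in range(runs)) / runs

# Usage with assumed parameter values.
params = dict(N=100, e_F=100.0, e_B=200.0, e_R=100.0, D_T=1500.0, D_I=1500.0,
              phi=0.03, alpha=0.25, beta=0.25)
nr = mean_cost(1000, seed=1, threshold=None, **params)
for m in (10, 14, 16):
    h = mean_cost(1000, seed=1, threshold=0.05 * m, **params)
    print(f"p = {0.05 * m:.2f}: relative difference vs. never remapping = {(h - nr) / nr:+.4f}")
```

Using mean cycle times leaves the expected total unchanged as long as decisions depend only on the gain test reports, which is the case in this sketch.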

In doing a sensitivity study we are confronted with the problem of finding an appropriate collection
of parameter sets. It is intuitively clear that remapping can be useful only if its gains are not
dominated by its costs; we first constrain the space of parameter values by estimating an envelope of
parameter values where gains exceed costs. The envelope is calculated as a function of N, $\phi$ and
$D = D_T + D_I$ as follows. First, we calculate the expected possible gain. The per cycle gain is
$G = e_B - e_R$; for constant N, the expected possible gain is simply


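$$\begin{aligned}
E_G &= \sum_{i=1}^{N} G\,(N - i)\,\mathrm{Prob}\{\text{gain first achieved at } i\} \\
    &= G \sum_{i=1}^{N} (N - i)\,\phi(1 - \phi)^{i-1} \\
    &= G \left[ N - \frac{1 - (1 - \phi)^N}{\phi} \right].
\end{aligned}$$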


The last step follows from uninteresting algebra which uses the fact that the time of gain distribution is
nearly a geometric distribution. Assuming that the remapping overhead consists only of one successful
remapping test and implementation, the envelope of admissible gain and cost parameters is found by
equating E_G to D_T + D_I. That is, for any value of G, any remapping cost up to E_G is less than the
expected possible gain. Figure 3 shows the envelope under the assumptions that N = 100, φ = 3/100,
and e_B = 200 (which bounds G from above by 200). We will use these parameter values throughout
our study; in addition, we assume that D_T = D_I and denote D = D_T + D_I. The dashed lines in figure 3
identify the subsets of the envelope space where we will choose G and D_T + D_I for our empirical
study. These lines span the space of admissible cost and gain values fairly well, and should give us a
reasonable picture of performance sensitivity throughout the envelope. Note that the dashed lines
delineate the marginal functions of G and D from the three (G, D) coordinates (50,200), (100,3000), and
(150,9000).
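
The envelope test can be checked numerically. The sketch below is only an illustration under the
stated assumptions (N = 100, φ = 3/100); the function names are hypothetical and not part of the
original study. It evaluates E_G both as the explicit sum and in closed form, and tests whether a
(G, D) pair lies inside the envelope, that is, whether D does not exceed E_G.

```python
def expected_gain_sum(G, N, phi):
    """E_G as the explicit sum over the cycle i at which gain is first achieved."""
    return sum(G * (N - i) * phi * (1.0 - phi) ** (i - 1) for i in range(1, N + 1))


def expected_gain_closed(G, N, phi):
    """E_G from the closed form G * (N - (1 - (1 - phi)^N) / phi)."""
    return G * (N - (1.0 - (1.0 - phi) ** N) / phi)


def inside_envelope(G, D, N=100, phi=0.03):
    """True when the total remapping cost D does not exceed the expected possible gain."""
    return D <= expected_gain_closed(G, N, phi)


if __name__ == "__main__":
    # The three study coordinates, plus (100, 9000), which the text places outside the envelope.
    for G, D in [(50, 200), (100, 3000), (150, 9000), (100, 9000)]:
        e_g = expected_gain_closed(G, 100, 0.03)
        print(f"G={G:3d} D={D:5d} E_G={e_g:8.1f} inside={inside_envelope(G, D)}")
```

Under these assumptions E_G is roughly 68 times G, so the three study coordinates fall inside the
envelope while (100, 9000) does not.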

N was set to 100 in all of the simulations we report here. This seems reasonable since dynamic
remapping is a consideration only for relatively long-lived computations, and simulations showed that
except for small N (N < 10), N does not affect the performance measures much. Our studies looked at
marginal performance with respect to G, to D = D_T + D_I, and to gain detection accuracy. Essentially,
we varied these parameters as much as possible through the three interior points identified in figure 3.
The studies which focused on costs and gains assumed that α = β = 0.25.

Each graph maps the performance of heuristic policies in relation to the optimal policy. A data
point is generated by 1000 runs of a simulator which calculates the execution time under (1) the
optimal policy, (2) a fixed p-threshold policy, and (3) never remapping, or the NR policy. If the sum
of execution times for policy h is hcost and the sum of execution times for the optimal policy is ocost,
then the data point plotted for policy h is (hcost - ocost)/ocost. The independent variable in each of
these graphs is p. The piece-wise linear curves map the relative difference between the p-threshold
policy and optimal policy as a function of p. Each strictly horizontal curve plots the relative difference
between the NR and optimal policies.
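
As a small illustration of how one plotted data point is formed, the sketch below computes
(hcost - ocost)/ocost from per-run execution times. The function name and the sample run times are
hypothetical; the simulator itself is not reproduced here.

```python
def relative_difference(policy_times, optimal_times):
    """Relative difference (hcost - ocost) / ocost over a batch of simulation runs."""
    hcost = sum(policy_times)    # total execution time under the heuristic policy h
    ocost = sum(optimal_times)   # total execution time under the optimal policy
    return (hcost - ocost) / ocost


if __name__ == "__main__":
    # Made-up execution times for three runs, for illustration only.
    print(relative_difference([105.0, 110.0, 115.0], [100.0, 100.0, 100.0]))
```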

Figures 4, 5, and 6 show the sensitivity to changing G through the envelope coordinates (50,200),
(100,3000), and (150,9000) respectively. Every p-threshold policy and NR policy curve is marked with
the G value defining that curve. On each of the graphs we notice that the relative performance of the
NR policy deviates sharply from the optimal as we increase G. This comes as no surprise, since when
there is no possible gain, optimal performance is achieved by simply never remapping. For every
fixed value of G, it is also interesting to compare the relative performance of the NR and p-threshold
policies by looking for the point of intersection between their respective curves. For example, we find
no such intersections in figure 4, implying that within the indicated parameter range, any p-threshold
policy will yield better performance than the NR policy. This is not the case in figures 5 and 6, where
remapping costs are much higher. There we have marked the point of intersection for each pair of
curves associated with a fixed value of G. Every pair of curves shown in figure 5 has a point of
intersection; the largest is at approximately p = 0.5 and is associated with G = 50. For the range of
parameter values given by this graph, we see that choosing p > 0.5 leads to performance better than the
NR policy. The actual performance gain over the NR policy depends strongly on the value of G. Similar
conclusions can be drawn from figure 6. The curves for G = 150 intersect at approximately p = 0.65; for
any G > 150 and any p > 0.65, we can outperform the NR policy. On the other hand, the curves for
G = 100 do not intersect, and the NR policy outperforms every p-threshold policy. Note too that the
NR policy in this case is nearly optimal, and that the point (100,9000) lies outside of the envelope.
Overall, the deviation from optimality of the p-threshold policy for high p is generally less than 10%,
and can be less than 2-3%.

Figures 7, 8, and 9 illustrate the sensitivity to changing D. We can make observations similar to
those on changing G. As the cost increases, the difference between the optimal policy and the NR
policy decreases. The points of intersection between policy curves are again marked, and again we see
that choosing p > 0.65 leads to performance gains over the NR policy. Once again we see that the
performance of the p-threshold policy for high p deviates only slightly from the performance of the
optimal policy. Any non-monotonicity in the p-threshold curves is likely due to statistical fluctuation.

Figures 10, 11, and 12 show the sensitivity to changing gain test accuracy. For simplicity we
assumed that α = β, and varied these parameters from 0.05 to 0.5. No larger values of these
parameters need be considered, since the information content of a test with error probability p is equal
to that of a test with error probability 1 - p. Collectively, these figures show that optimal performance is
relatively less sensitive to gain test accuracy than it is to remapping cost or gain. Figure 10 shows the
sensitivity at (50,200), where we see that if α < 0.4, then every p-threshold policy achieves better
performance than the NR policy. The case where α = 0.5 provides an interesting contrast: if p is
too high, then the NR policy is slightly better. This curious effect is understood by realizing that a test
with 50% failure gives no information at all. The only evidence the model receives for possible gain
is from the time of gain distribution. A high probability of gain is achieved only at steps near the end
of the computation; but then the costs of remapping threaten to dominate the gains. This same
phenomenon is observed when α = 0.5 in figures 11b and 12b. The sensitivity at (100,3000) and
(150,9000) is more easily discerned by considering high accuracy and low accuracy cases separately.
Figures 11a and 12a show the high accuracy cases. Once again we illustrate the points of policy curve
intersection, and again we see that simply choosing a relatively high value of p leads to performance
gains over the NR policy. This is also true for low accuracy values of α at (100,3000) (with the
exception of α = 0.5) shown in figure 11b, but is not the case at (150,9000). Figure 12b clearly shows
that at (150,9000), low accuracy is bad news. Only in the case of α = 0.3 can we outperform the NR
policy, and then only marginally. At the envelope boundary we apparently need relatively high accuracy
gain detection tests (figures 6 and 9 show that α = β = 0.25 is adequate). Again, locally non-monotone
behavior in the p-threshold curves is likely due to statistical fluctuation.
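
The two remarks about test accuracy (a test with 50% error carries no information, and error
probabilities e and 1 - e are equally informative) can be illustrated with a plain Bayes update.
This is a simplified sketch, not the model's update rule from equations (2) through (4): it ignores
the phase-change probability φ, and it assumes that α is the probability of a positive result when
no gain is possible and β the probability of a negative result when gain is possible. The function
name is hypothetical.

```python
def posterior_gain(prior, positive, alpha, beta):
    """Bayes update of the gain probability after a single test result.

    Assumed meanings: alpha = P(positive | no gain), beta = P(negative | gain).
    """
    if positive:
        like_gain, like_none = 1.0 - beta, alpha
    else:
        like_gain, like_none = beta, 1.0 - alpha
    evidence = prior * like_gain + (1.0 - prior) * like_none
    return prior * like_gain / evidence


if __name__ == "__main__":
    # With alpha = beta = 0.5 the posterior equals the prior: the test is uninformative.
    print(posterior_gain(0.3, True, 0.5, 0.5))
    # A positive result at error 0.2 matches a negative result at error 0.8.
    print(posterior_gain(0.3, True, 0.2, 0.2), posterior_gain(0.3, False, 0.8, 0.8))
```

Relabeling the outcomes of a test with error 1 - e turns it into a test with error e, which is why
only error probabilities up to 0.5 need to be studied.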

We have so far assumed that the p-threshold heuristic knows the precise values of α, β and φ.
Since this is not likely to be true in practice, we examined the effect of using erroneous values for
these parameters. Figure 13 shows the sensitivity of the p-threshold policy to faulty values of α and
β. Once again, we calculated the relative difference from optimal performance (when α, β, and φ are
precisely known). The measurements were taken when G = 100, D = 3000, and the true value of
α = β is 0.25. A curve labeled with scale factor f denotes the performance achieved when the heuristic
used f·0.25 as α (and β). As before, the horizontal line illustrates the relative performance of the NR
policy. Figure 13 shows that even if we assume that the gain detection mechanism yields no
information (scale factor of 2), we can still outperform the NR policy. It is also clear, though, that
serious mis-estimation of α and β leads to significant performance decline.

Figure 14 looks at the sensitivity to faulty values of φ. The measurements were taken with the
same parameter values as assumed in figure 13. Again, each curve is labeled with a factor used to
scale φ up or down for the p-threshold heuristic. Here we see that the consequences of over-estimating
φ can be quite serious; this arises because a high value of φ increases the gain probability, leading to
an increase in the number of premature remapping decisions. At first glance, it seems counter-intuitive
that strictly under-estimating φ should be better than using an exact value of φ. This phenomenon is
understood by noting that the p-threshold curve with exact φ at envelope point (100,3000) is monotone
decreasing (at least for p < .95). The net effect of underestimating φ is to hamper the increase in gain
probability after positive gain test results, meaning that more post-gain cycles will occur before the
gain probability is high enough to exceed the chosen threshold. For exact φ, this same effect is
obtained by increasing the threshold value. Thus, we may think of the curves obtained by
underestimating φ as left-moving translations of the exact φ curve. Since the exact φ curve is monotone
decreasing, the p-threshold policy which underestimates φ will perform better than the p-threshold
policy which uses an exact value of φ. If φ is unknown, it appears to be best to adopt a low value of φ,
and let the statistical weight of successive positive gain tests drive the gain probability up.

As a final note, all of the studies we have reported are for the two phase model. It is likely that
the optimal policy for a multiple (more than 2) phase model will have a threshold structure, but in
view of the limited utility and added complexity of determining the optimal policy, we will not attempt
to prove this. We did however compare a p-threshold heuristic with the NR policy on a multi-phase
model. At every phase change, the model increased the non-remapped cycle execution time by 50
units, and increased the remapping gain by 25 units. We used p = 0.75, φ was set to 0.05, N was set
to 250, and we assumed that α = β = 0.25. The results were quite encouraging. Starting at the
envelope point (150,9000) and e_B = 200, the 0.75-threshold policy outperformed the NR policy by a
relative percentage of 66%. From (100,3000) the 0.75-threshold policy outperformed the NR policy by
a relative percentage of 202%. It is clear then that for multi-phase computations, a simple p-threshold
policy can significantly improve performance.

We can draw several important conclusions from our sensitivity study. The p-threshold policies work
quite well when p is relatively high. In these cases, a p-threshold policy will almost always outperform
the NR policy; furthermore, we can expect its performance to be within a few percentage points of the
optimal policy. It is clear that we need to be careful about extremely high remapping costs and
inaccurate gain detection mechanisms. To protect ourselves from an inaccurate gain detection mechanism
we can choose p so that it is not too high. Apparently a value of p ∈ [0.7, 0.8] is high enough to
protect against cost/gain imbalance, and low enough to protect against inaccurate gain detection (be
warned, however, that this latter protection depends on φ). The most important implication of all this is
that accurate estimation of per cycle remapping gain G is not necessary; nearly optimal performance
can be achieved with a fixed threshold policy. We should try to determine whether the relationship
between the per cycle gain G and remapping costs D allows remapping, but remapping costs can be
surprisingly high and still allow remapping gain. For example, the ratio of remapping cost to per cycle
gain with a gain of 150 and cost of 9000 is 60. In general, the largest permissible such ratio depends
on N and φ; but with a relatively long computation, and probability of phase change exceeding 1/2, we
can expect a high ratio. We have also seen that a certain degree of inaccuracy in estimating α, β and φ
can be tolerated. This is important, as these parameters too are not likely to be known exactly. Finally,
we saw that it is reasonable to expect good performance from threshold policies used on computations
with multiple phase transitions, provided that the phases are relatively long-lived.

5. Conclusions 

An effective mapping of workload to processors in a parallel processing system must make certain
assumptions about the computation's running behavior. The behavior of many computations is
characterized as a sequence of phases, where behavior within a phase is fairly stable, but the behavior
between two phases can be quite different. It is therefore possible for a mapping to become ineffective
when a phase change occurs, so that dynamically remapping the computation may be required to
maintain good performance. However, the decision to remap must take into account the performance gains
and costs involved, and must deal with uncertainty in whether remapping leads to gains. We have
modeled this decision problem with a Markov decision process, and have determined the structure of
the optimal decision policy. While this is an interesting theoretical result, it is not immediately
practical. We therefore empirically studied the performance of a simple threshold heuristic which does
not assume knowledge of remapping's costs and gains. We found that this heuristic works remarkably
well, implying that the remapping decision problem does not require precise estimates of these
parameters. The key issue for the remapping decision problem is thus the relatively accurate assessment
of when remapping leads to performance gains.

Appendix

In this appendix we prove Lemma 1 from section 3, and discuss an algorithm for approximating
V(<·,n>). We first prove Lemma 1.

Proof of Lemma 1

Some of our analysis conditions on the value of N. We use the notation g(<p,n>|N=m) to denote
the value of function g at state <p,n> given that N = m. We will also say that a function g is plcc if it
is piece-wise linear, continuous, and concave.

Lemma 1's first claim is that for every n, C_m(<p,n>) is a linear function of p. This is easily seen
from its definition in equation (6); note that E_V(<0,n>) does not depend on p. Lemma 1's second claim
is that for every n, C_t(<p,n>) is a plcc function of p. This result follows primarily from the following
lemma reported in [18] and stated in terms of our notation:

LEMMA A-1: Suppose that N = m. If V(<p,n+1>|N=m) is a plcc function of p, then
E_V(<p,n>|N=m) is a plcc function of p.

□

We use this lemma to establish Lemma 1's second claim, by showing that for every fixed n > 0,
V(<p,n>) and C_t(<p,n>) are plcc functions of p. First condition on N = m for some m. We will
inductively show that V(<p,n>|N=m) and C_t(<p,n>|N=m) are plcc functions of p. For the base case we
consider n = m. For any p ∈ [0,1],

$$
\begin{aligned}
C_t(\langle p,m\rangle \mid N=m) &= p^*(p)\,e_B + (1 - p^*(p))\,e_F + E_V(\langle p,m\rangle \mid N=m) \\
                                 &= p^*(p)\,e_B + (1 - p^*(p))\,e_F
\end{aligned}
$$

since V(<p,m+1>|N=m) = 0 for all p. Since p*(p) is linear in p, C_t(<p,m>|N=m) is also linear in p. We
also observe that C_m(<p,m>|N=m) is plcc since it is linear. The class of plcc functions is closed under
the point-wise minimum operation; V(<p,m>|N=m) must also be plcc, establishing the induction base.

For the induction hypothesis we suppose that both C_t(<p,n+1>|N=m) and V(<p,n+1>|N=m) are
plcc functions of p for some n ≤ m - 1. Lemma A-1, and the closure of plcc functions under addition
and point-wise minimum again ensure that C_t(<p,n>) and V(<p,n>) are plcc functions of p, completing
the induction.

To complete the proof, we note that the class of plcc functions is also closed under scalar
multiplication, and observe that

$$
V(\langle p,n\rangle) = \sum_{m=0}^{M} \mathrm{Prob}\{N = m\}\, V(\langle p,n\rangle \mid N=m)
$$

and

$$
C_t(\langle p,n\rangle) = \sum_{m=0}^{M} \mathrm{Prob}\{N = m\}\, C_t(\langle p,n\rangle \mid N=m).
$$

To help establish Lemma 1's third and fourth claims, we analyze the values of C_m(<p,n>) and
C_t(<p,n>) at p = 1. Key results are given by Lemma A-2.

LEMMA A-2: Either

(i) V(<1,n>) = C_m(<1,n>) for all n for which Prob{N = n} ≠ 0; or

(ii) V(<1,n>) = C_t(<1,n>) for all n for which Prob{N = n} ≠ 0; or

(iii) There exists an n_0 (possibly ∞) such that for all n < n_0, V(<1,n>) = C_m(<1,n>), and for all
n ≥ n_0 for which Prob{N = n} ≠ 0, V(<1,n>) = C_t(<1,n>).

PROOF: We condition on N = m, for any 0 ≤ m ≤ M. Let K be the largest integer such that
(e_B - e_R) K < D_T + D_I. Simple algebra (omitted here) establishes the inductive proof that for all n
such that m - K < n ≤ m,

$$
V(\langle 1,n\rangle \mid N=m) = C_t(\langle 1,n\rangle \mid N=m) = (m - n + 1)\,e_B ;
$$

and that for 0 ≤ n ≤ m - K,

$$
V(\langle 1,n\rangle \mid N=m) = C_m(\langle 1,n\rangle \mid N=m) = D_T + D_I + (m - n + 1)\,e_R
$$

and

$$
C_t(\langle 1,n\rangle \mid N=m) = e_B + D_T + D_I + (m - n)\,e_R .
$$

Define d(n|N=m) to be the conditional difference C_m(<1,n>|N=m) - C_t(<1,n>|N=m), and d(n) to be the
unconditional difference C_m(<1,n>) - C_t(<1,n>). From the equations above, we see that as a function
of m,

$$
d(n \mid N=m) =
\begin{cases}
D_T + D_I - (e_B - e_R)(m - n + 1) & \text{for } m < n + K \\
e_R - e_B & \text{for } n + K \le m
\end{cases}
$$

It follows from the definition of K that d(n|N=m) is a decreasing function of m. The unconditional
difference d(n) is obtained by taking the expectation of d(n|N=m) with respect to the residual
distribution N_n of N given N > n. In [20] it is shown that if N has a decreasing failure rate function, then
E[g(N_n)] ≤ E[g(N_{n+1})] for all decreasing functions g. In particular, d(n) ≤ d(n+1), showing that the
difference C_m(<1,n>) - C_t(<1,n>) is an increasing function of n. Then case (i) occurs if d(n) is
negative for all n, case (ii) occurs if d(n) is positive for all n, and case (iii) occurs if d(n) changes sign at
n = n_0.

□
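
The monotonicity of the conditional difference in m can be checked directly. The sketch below
transcribes the piecewise expression for d(n|N=m) as read here, with K taken as the largest integer
such that (e_B - e_R) K < D_T + D_I. The function names are hypothetical, and the numbers are
illustrative values at the envelope point (100,3000) with e_B = 200 (so e_R = 100 is an assumption
that follows from G = e_B - e_R).

```python
def largest_k(e_b, e_r, d_total):
    """Largest integer K with (e_b - e_r) * K < d_total (strict inequality)."""
    gain = e_b - e_r
    k = int(d_total // gain)
    if k * gain >= d_total:   # step back when d_total is an exact multiple of the gain
        k -= 1
    return k


def conditional_difference(n, m, e_b, e_r, d_total):
    """d(n | N=m) = C_m(<1,n> | N=m) - C_t(<1,n> | N=m), from the piecewise form in the proof."""
    k = largest_k(e_b, e_r, d_total)
    if m < n + k:
        return d_total - (e_b - e_r) * (m - n + 1)
    return e_r - e_b


if __name__ == "__main__":
    values = [conditional_difference(1, m, 200, 100, 3000) for m in range(1, 60)]
    # The sequence never increases as m grows.
    print(all(a >= b for a, b in zip(values, values[1:])))
```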


These results establish Lemma 1's fourth claim, that if n < n_0, then C_t(<1,n>) > C_m(<1,n>). Finally, to
establish its third claim, we will show that C_t(<p,n>) is linear in p whenever n ≥ n_0. Since C_m(<·,n>) is
always linear, and C_t(<0,n>) < C_m(<0,n>) for all n, and C_t(<1,n>) ≤ C_m(<1,n>) when n ≥ n_0, it will
follow directly that C_t and C_m cannot intersect, so that C_t(<p,n>) ≤ C_m(<p,n>) for all p ∈ [0,1].

LEMMA A-3: If n ≥ n_0, then C_t(<p,n>) is linear in p, and V(<p,n>) = C_t(<p,n>) for all p ∈ [0,1].

PROOF: We proceed by induction. M is the largest integer such that Prob{N = M} ≠ 0, so that
V(<p,M+1>) = 0 for all p and

$$
V(\langle p,M\rangle) = \min \left\{
\begin{array}{l}
D_I + p\,(e_R + D_T) + (1 - p)\bigl(p^*(0)\,e_B + (1 - p^*(0))\,e_F\bigr) \\
p^*(p)\,e_B + (1 - p^*(p))\,e_F
\end{array}
\right.
$$

Presuming that n_0 ≤ M, we have C_t(<1,M>) ≤ C_m(<1,M>) and C_t(<0,M>) < C_m(<0,M>), so that
V(<p,M>) = C_t(<p,M>) = p*(p) e_B + (1 - p*(p)) e_F, which is linear in p. The induction base is thus
satisfied.


For the induction hypothesis, we suppose there is an n ≥ n_0 such that V(<p,n+1>) = C_t(<p,n+1>) for all
p ∈ [0,1], and that C_t(<p,n+1>) is linear in p. Equation (5) implies that

$$
C_t(\langle p,n\rangle) = p^*(p)\,e_B + (1 - p^*(p))\,e_F
  + q^{c}(p)\,V(\langle p^{c}(p),n+1\rangle) + q^{\bar c}(p)\,V(\langle p^{\bar c}(p),n+1\rangle),
$$

and the induction hypothesis states that

$$
V(\langle p,n+1\rangle) = A p + B
$$

for some A and B. Equations (2), (3), and (4) express p^c(p) and p^c̄(p) as quotients whose
denominators are q^c(p) and q^c̄(p) respectively; it follows that

$$
C_t(\langle p,n\rangle) = p^*(p)\,e_B + (1 - p^*(p))\,e_F + A\,p^*(p) + B
$$

which is linear in p since p*(p) is linear in p. Since C_m(<p,n>) and C_t(<p,n>) cannot intersect it
follows directly that C_m(<p,n>) exceeds C_t(<p,n>) for all p ∈ [0,1]. Thus V(<p,n>) = C_t(<p,n>),
completing the induction.

□

Calculating/Approximating V(<·,n>)

We now discuss an algorithm for calculating or approximating the functions V(<·,n>). The
proofs for various properties we claim for this algorithm are detailed in [14].

To deal with the fact that N can be random, we will condition on N = m for some m. The
procedure we describe is then repeated for all m such that Prob{N = m} ≠ 0, and then the conditional
functions are combined.

We know that V(<·,n>|N=m) = C_t(<·,n>|N=m) is a linear function for n ≥ n_0. It is a simple
matter to define V(<p,m+1>|N=m) = 0 for all p, and then determine the slope and intercept of C_t and
C_m using equations (5) and (6). Since C_t(<0,n>|N=m) is less than C_m(<0,n>|N=m) for all n, we test for
an intersection of C_t and C_m by checking to see if C_t(<1,n>|N=m) > C_m(<1,n>|N=m). If
C_t(<p,m>|N=m) and C_m(<p,m>|N=m) do not intersect, we infer that V(<p,m>|N=m) = C_t(<p,m>|N=m),
and calculate the slope and intercepts for C_t(<·,m-1>|N=m) and C_m(<·,m-1>|N=m); again, examining
the functional values at p = 1 will determine whether intersection has occurred. We repeat this
process until the two cost curves intersect.

Now we assume that V(<·,n>|N=m) or some approximation to V(<·,n>|N=m) is known. A
convenient representation for this function is as an array of records where each record holds a transition
point, the value of V at that point, and the slope and intercept of the linear segment extending to the
value of the function at the next greatest transition point. This array is sorted by the transition point
values. The set of transition points for C_t(<·,n-1>|N=m) is found by defining the sets

$$
S_1 = \{\, q > 0 \mid p^{c}(q) = d \text{ for some } d \in D(V(\langle\cdot,n\rangle)) \,\}
$$

and

$$
S_2 = \{\, q > 0 \mid p^{\bar c}(q) = d \text{ for some } d \in D(V(\langle\cdot,n\rangle)) \,\}
$$

Since both p^c(p) and p^c̄(p) are increasing functions of p (when α, β < 0.5), sorted representations of
S_1 and S_2 are easily obtained by simply choosing the d's for solution in p^c(q) = d and p^c̄(q) = d in
sorted order. The ordered lists for S_1 and S_2 are then merged (removing identical entries if necessary)
into a sorted list for D(C_t(<·,n-1>|N=m)). The function values for C_t at these points are then
determined by equation (6). For any e ∈ D(C_t(<·,n-1>|N=m)) the calculation of C_t(<e,n-1>|N=m) requires
identification of V(<p^c(e),n>|N=m) and V(<p^c̄(e),n>|N=m). This is efficiently done if we start the C_t
value calculation procedure by first placing a "p^c(·)-pointer" and a "p^c̄(·)-pointer" at the head of
the V(<·,n>|N=m) array, and then process the points in D(C_t(<·,n-1>|N=m)) in sorted order. To
determine V(<p^c(e),n>|N=m) we simply advance the p^c(·) pointer until the appropriate linear segment
of V(<·,n>|N=m) is encountered. Since p^c(p) is increasing in p, the fact that the points in
D(C_t(<·,n-1>|N=m)) are processed in sorted order means that we never have to back the p^c(·)
pointer up: the search for V(<p^c(e),n>|N=m) can start where the pointer was last left. Similar
observations hold for the p^c̄(·) pointer. Since C_m(<·,n-1>|N=m) is linear, it is easy to compare
C_m(<e,n-1>|N=m) and C_t(<e,n-1>|N=m) for every C_t transition point e. When we encounter the first e
such that C_m(<e,n-1>|N=m) < C_t(<e,n-1>|N=m), we know that C_t and C_m intersect in the interval
between e and the greatest transition point less than e. It is then straightforward to calculate the point
of intersection, so that the complete set of transition points for V(<·,n-1>|N=m) is defined. The
slopes and intercepts for each of V's linear segments are then calculated. Each of the steps
outlined above has linear complexity in twice the size of D(V(<·,n>|N=m)).
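
The forward-only pointer sweep described above can be sketched as follows. This is a minimal
illustration rather than the authors' implementation: the record layout (parallel lists
`breakpoints`, `slopes`, `intercepts`) and the function name are assumptions, and the query list
stands for the already-mapped points p^c(e) taken in increasing order.

```python
def evaluate_sorted(breakpoints, slopes, intercepts, queries):
    """Evaluate a piece-wise linear function at non-decreasing query points in one sweep.

    breakpoints[k] is the left end of segment k, on which the function equals
    slopes[k] * x + intercepts[k]. The segment pointer only moves forward, so the
    total work is linear in len(breakpoints) + len(queries).
    """
    values = []
    k = 0  # pointer into the segment array; never moved backwards
    for x in queries:
        while k + 1 < len(breakpoints) and breakpoints[k + 1] <= x:
            k += 1
        values.append(slopes[k] * x + intercepts[k])
    return values


if __name__ == "__main__":
    # Hypothetical concave function min(x, 0.5) on [0, 1], stored as two segments.
    print(evaluate_sorted([0.0, 0.5], [1.0, 0.0], [0.0, 0.5], [0.1, 0.4, 0.5, 0.9]))
```

Running one such sweep for each of the two pointers gives the linear cost noted in the text,
because neither pointer ever has to be reset.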

The procedure outlined so far will calculate the optimal cost function exactly. But because the
number of transition points essentially doubles at every step, the computational complexity and storage
requirements of the algorithm are prohibitive. We next describe a method of constructing an accurate
approximation to V(<·,n>|N=m) which significantly reduces the number of transition points required
to represent it. We assume that V(<·,n>|N=m) or an approximation to it is known, and is denoted by
W. We assume that W is piece-wise linear and concave. We also assume an error tolerance ε, and
desire an approximation to W which bounds the maximal error between W and its approximation,
which minimizes the number of transition points used to achieve that tolerance, and which remains
concave. We will restrict ourselves to interior approximations defined as follows. If A is an interior
approximation to W, then A(e) = W(e) at each of A's transition points, A is piece-wise linear, and
A(p) ≤ W(p) for all p. If W is concave then A will be concave. Our approximation procedure finds an
interior approximation to W which minimizes over all interior approximations the number of transition
points needed to achieve W(p) - A(p) ≤ ε for all p.

The basic idea of the approximation is to build up its line segments piece by piece. Starting with
the leftmost endpoint of the interval (p = 0), the rightmost endpoint of A's first segment is chosen so
that the maximal difference between A and W over that interval is exactly ε. The chosen endpoint is
then taken as the left endpoint of the next approximation segment, and again the right endpoint is
chosen so that the maximal error of the second approximation segment is exactly ε. This process is
repeated until W is completely approximated. The number of transition points chosen using this
method minimizes the number of transition points needed by an interior approximation to achieve an
error tolerance of ε. Furthermore, the complexity of approximating W by A is linear in the number of
W's transition points. We now outline this approximation in more detail.

Suppose that l is the left endpoint of the interval for A that we are attempting to construct, and
consider the interior approximation to W consisting of a single linear segment between W(l) and W(u),
u > l. To emphasize the right endpoint, we will call this segment the u-segment. The point at which the
difference between W and the u-segment is maximized over [l, u] is the right hand endpoint associated
with W's linear segment having the least slope greater than the u-segment slope, (W(u) - W(l))/(u - l).
Furthermore, the maximal error between the u-segment and W over [l, u] is a continuous increasing
function of u. Thus, given l, we can calculate the maximal error over [l, d_i], for any of W's transition
points d_i > l, the d_i being examined in increasing sorted order. Upon finding the first d_j such that the
maximal error between W and the d_j-segment exceeds ε, we can calculate the point u',
d_{j-1} ≤ u' < d_j, such that the u'-segment's maximal error over [l, u'] is exactly ε, as follows. Let m
and b be the slope and intercept of the W segment between W(d_{j-1}) and W(d_j). For u ∈ [d_{j-1}, d_j]
the slope of the u-segment is

$$
m(u) = \frac{m\,u + b - W(l)}{u - l}
$$

and its intercept is

$$
b(u) = W(l) - m(u)\,l .
$$

At any point e ∈ [l, u], the error between W and the u-segment is given by

$$
W(e) - \bigl[\, m(u)\,e + b(u) \,\bigr]. \tag{7}
$$

Let e(u) be the W transition point at which the u-segment's maximal error over [l, u] occurs. For
u ∈ [d_{j-1}, d_j], e(u) is an increasing step function; let u_1, ..., u_k be the ordered sequence of points
where e(u) is discontinuous, and u_1 = d_{j-1}. It will be understood that e(u_i) denotes the value of e(·)
as u_i is approached from the right. Given e(u_i), it is not difficult to identify u_{i+1}. We know that e(·)
takes a step up precisely when the slope of the u-segment is equal to the slope of the W segment
between e(u_i) and e(u_{i+1}); since this slope m_i is known from W's description, and it is known that the
u-segment passes through (l, W(l)) regardless of u, it is a simple matter to calculate where a line with
slope m_i which passes through (l, W(l)) will intersect the W segment between W(d_{j-1}) and W(d_j). The
point of intersection will be u_{i+1}.

e(u_1) was identified when the interval [d_{j-1}, d_j] was identified, so that e(u_2) is the next larger W
transition point, and u_2 can be identified as outlined above. Now we hypothesize that the u which
leads to ε maximum error is in the interval [d_{j-1}, u_2], so that the maximum error occurs at e(u_1).
Following expression (7), we solve for u in the equation

$$
\varepsilon = W(e(u_1)) - \bigl[\, m(u)\,e(u_1) + b(u) \,\bigr].
$$

If the solution u' is less than u_2, then we select u' as the approximation endpoint. Otherwise, we
hypothesize that the endpoint sought is in the interval [u_2, u_3], and solve for u in the equation

$$
\varepsilon = W(e(u_2)) - \bigl[\, m(u)\,e(u_2) + b(u) \,\bigr].
$$

Again, e(u_3) is the next larger transition point from e(u_2), and if the solution u' to the equation is less
than u_3 we are done. This process is repeated until the u' which causes the maximal error over [l, u']
to be exactly ε is found. u' then defines the right endpoint of the A segment under construction. u' is
then used as the left endpoint for the next segment, found using this same procedure. Because every
one of W's transition points is scanned at most once in looking for a bounding interval on u', and at
most once in narrowing the search for u' in that bounding interval, the complexity of approximating W
with A is linear in the number of W's transition points.
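
The greedy construction can be prototyped as below. This is a deliberately simplified sketch, not
the linear-time procedure described above: instead of solving equation-by-equation for u', it
finds each right endpoint by bisection, relying only on the stated fact that the maximal error of
the u-segment is continuous and increasing in u. W is assumed to be given as a sorted list of
(x, y) vertices of a concave piece-wise linear function, and all names are hypothetical.

```python
def w_eval(pts, x):
    """Value at x of the piece-wise linear function with sorted vertices pts."""
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the domain of W")


def chord_error(pts, l, u):
    """Maximal gap between W and the chord from (l, W(l)) to (u, W(u)), requiring l < u.

    For concave W the gap is largest at one of W's vertices strictly inside (l, u).
    """
    yl, yu = w_eval(pts, l), w_eval(pts, u)
    slope = (yu - yl) / (u - l)
    return max((y - (yl + slope * (x - l)) for x, y in pts if l < x < u), default=0.0)


def next_endpoint(pts, l, eps, iters=60):
    """Largest u (up to bisection accuracy) whose chord from l has error at most eps."""
    right = pts[-1][0]
    if chord_error(pts, l, right) <= eps:
        return right
    lo = min(x for x, _ in pts if x > l)  # the chord to the next vertex has zero error
    hi = right
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if chord_error(pts, l, mid) <= eps:
            lo = mid
        else:
            hi = mid
    return lo


def approximate(pts, eps):
    """Vertices of a greedy interior approximation A to W with error at most eps."""
    xs = [pts[0][0]]
    while xs[-1] < pts[-1][0]:
        xs.append(next_endpoint(pts, xs[-1], eps))
    return [(x, w_eval(pts, x)) for x in xs]


if __name__ == "__main__":
    # Hypothetical concave W: the parabola -x^2 sampled at 21 points on [0, 1].
    W = [(k / 20, -(k / 20) ** 2) for k in range(21)]
    A = approximate(W, 0.01)
    print(len(W), len(A))
```

Each call rescans W, so this version is slower than the procedure in the text; it is meant only to
make the greedy choice of endpoints concrete.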

If W is an approximation to V(<·,n>|N=m) whose error is no larger than δ, and if W's
approximation A has error no larger than ε, then A is an approximation to V(<·,n>|N=m) with error no
larger than δ + ε. Thus we can bound the error on V(<·,1>|N=m) by ε if we construct approximations to
each V(<·,n>|N=m) with tolerance ε/m. In practice, we achieve somewhat better accuracy by calling
upon the approximation technique only when the number of transition points for the working level of
V(<·,n>|N=m) gets large. The approximation typically reduces the number of transition points
significantly, allowing us to do an exact mapping until the number of points again grows too large.

The procedures discussed above initially condition on N = m. We suppose then that
C_t(<·,n>|N=m) and V(<·,n>|N=m) have been calculated or approximated for every m such that
Prob{N = m} ≠ 0. Combining the conditional functions is relatively straightforward. The transition
points for C_t(<·,n>) are found by merging the transition points for each conditional function
C_t(<·,n>|N=m). Then for every e ∈ D(C_t(<·,n>)), we calculate the function value

$$
C_t(\langle e,n\rangle) = \sum_{m=1}^{M} \mathrm{Prob}\{N = m\}\, C_t(\langle e,n\rangle \mid N=m).
$$

Note that if the approximation to C_t(<e,n>|N=m) has maximal error of ε, then Prob{N = m} times that
approximation deviates from the true product Prob{N = m} C_t(<e,n>|N=m) by no more than
ε·Prob{N = m}. It follows that if for every m we approximate C_t(<e,n>|N=m) with tolerance ε, then the
sum above has tolerance ε.
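
The tolerance claim for the combined function can be written out in one line. In the sketch
below, the hat denotes the approximated conditional function (a notation introduced here only
for illustration), and the last step uses the fact that the probabilities sum to one.

```latex
\Bigl| \sum_{m} \mathrm{Prob}\{N = m\}\,\hat{C}_t(\langle e,n\rangle \mid N=m)
     - \sum_{m} \mathrm{Prob}\{N = m\}\,C_t(\langle e,n\rangle \mid N=m) \Bigr|
\;\le\; \sum_{m} \mathrm{Prob}\{N = m\}\,
        \bigl| \hat{C}_t(\langle e,n\rangle \mid N=m) - C_t(\langle e,n\rangle \mid N=m) \bigr|
\;\le\; \varepsilon \sum_{m} \mathrm{Prob}\{N = m\}
\;=\; \varepsilon .
```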

Since C_m(<·,n>|N=m) is linear for every m, C_m(<·,n>) is also linear; its slope and intercept can
be found by evaluating C_m(<0,n>) and C_m(<1,n>) as above. The intersection of C_m(<·,n>) and
C_t(<·,n>) can then be determined, and π_n discovered.

Fig. 9 Sensitivity to D When G = 150

Fig. 10 Sensitivity to α, β When G = 50, D = 200

Fig. 11 Sensitivity to α, β When G = 100, D = 3000

Fig. 13 Sensitivity to Mis-estimation of α, β

(Figure axes: relative difference from optimal, plotted against p.)

Standard Bibliographic Page

1. Report No.: NASA CR-178174; ICASE Report No. 86-58

4. Title and Subtitle: DYNAMIC REMAPPING DECISIONS IN MULTI-PHASE PARALLEL COMPUTATIONS

7. Author(s): David M. Nicol and Paul F. Reynolds, Jr.

9. Performing Organization Name and Address: Institute for Computer Applications in Science
and Engineering, Mail Stop 132C, NASA Langley Research Center, Hampton, VA 23665-5225

11. Contract or Grant No.: NAS1-17070, NAS1-18107

12. Sponsoring Agency Name and Address: National Aeronautics and Space Administration,
Washington, D.C. 20546

15. Supplementary Notes: Langley Technical Monitor: J. C. South. Submitted to IEEE Trans.
on Computers. Final Report.


17. Key Words (Suggested by Author(s)): parallel processing, multi-processors, load balancing
18. Distribution Statement: 61 - Computer Programming and Software; 66 - Systems Analysis; Unclassified - unlimited
19. Security Classif. (of this report): Unclassified
20. Security Classif. (of this page): Unclassified
21. No. of Pages:
22. Price:

For sale by the National Technical Information Service, Springfield, Virginia 22161

NASA Langley Form 63 (June 1985)










